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Abstract 

We introduce a new non-smooth variational model for the restoration of manifold-valued 
data which includes second order differences in the regularization term. While such 
models were successfully applied for real-valued images, we introduce the second order 
difference and the corresponding variational models for manifold data, which up to now 
only existed for cyclic data. The approach requires a combination of techniques from 
numerical analysis, convex optimization and differential geometry. First, we establish 
a suitable definition of absolute second order differences for signals and images with 
values in a manifold. Employing this definition, we introduce a variational denoising 
model based on first and second order differences in the manifold setup. In order to 
minimize the corresponding functional, we develop an algorithm using an inexact cyclic 
proximal point algorithm. We propose an efficient strategy for the computation of the 
corresponding proximal mappings in symmetric spaces utilizing the machinery of Jacobi 
fields. For the n-sphere and the manifold of symmetric positive definite matrices, we 
demonstrate the performance of our algorithm in practice. We prove the convergence 
of the proposed exact and inexact variant of the cyclic proximal point algorithm in 
Hadamard spaces. These results which are of interest on its own include, e.g., the 
manifold of symmetric positive definite matrices. 

Keywords, manifold-valued data, second order differences, TV-like methods on manifolds, non¬ 
smooth variational methods, Jacobi fields, Hadamard spaces, proximal mappings, DT-MRI 


1. Introduction 

In this paper, we introduce a non-smooth variational model for the restoration of manifold¬ 
valued images using first and second order differences. The model can be seen as a second 
order generalization of the Rudin-Osher-Fatemi (ROF) functional [60] for images taking 

*Max Planck Institute for Mathematics in the Sciences, Inselstr. 22, 04103 Leipzig, Germany, ba- 
cak@mis.mpg.de 

^Department of Mathematics, Technische Universitat Kaiserslautern, Paul-Ehrlich-Str. 31, 67663 Kaiser¬ 
slautern, Germany, {bergmann, steidl}@mathematik.uni-kl.de. 

^Department of Mathematics, Technische Universitat Miinchen and Fast Algorithms for Biomedical Imag¬ 
ing Group, Helmholtz-Zentrum Miinchen, Ingolstadter Landstr. 1, 85764 Neuherberg, Germany, an- 
dreas.weinmann@tum.de. 


1 



their values in a Riemannian manifold. For scalar-valued images, the ROF functional in its 
discrete, anisotropic penalized form is given by 

5 II/-+ ^l|Vu||i, A > 0, 

where / G is a given noisy image and the symbol V is used to denote the discrete 

first order difference operator which usually contains the forward differences in vertical and 
horizontal directions. The frequently used ROF denoising model preserves important image 
structures as edges, but tends to produce staircasing: instead of reconstructing smooth areas 
as such, the reconstruction consists of constant plateaus with small jumps. An approach 
for avoiding this effect incorporates higher order differences, respectively derivatives, in a 
continuous setting. The pioneering work [15] couples the TV term with higher order terms by 
infimal convolution. Since then, various techniques with higher order differences/derivatives 
were proposed in the literature, among them [13, 17, 21, 22, 36, 43, 45, 46, 50, 61, 62, 63]. 
We further note that the second-order total generalized variation was extended for tensor 
fields in [72]. 

In various applications in image processing and computer vision the functions of interest 
take values in a Riemannian manifold. One example is diffusion tensor imaging where the data 
lives in the Riemannian manifold of positive definite matrices; see, e.g., [7, 14, 53, 65, 75, 78]. 
Other examples are color images based on non-flat color models [16, 40, 41, 73] where the 
data lives on spheres. Motion group and SO(3)-valued data play a role in tracking, robotics 
and (scene) motion analysis and were considered, e.g., in [28, 52, 55, 58, 70]. Because of 
the natural appearance of such nonlinear data spaces, processing manifold-valued data has 
gained a lot of interest in applied mathematics in recent years. As examples, we mention 
wavelet-type multiscale transforms [35, 54, 76], robust principal component pursuit on 
manifolds [37], and partial differential equations [19, 32, 69] for manifold-valued functions. 
Although statistics on Riemannian manifolds is not in the focus of this work, we want to 
mention that, in recent years, there are many papers on this topic. 

In [30, 31], the notion of total variation of functions having their values on a manifold was 
investigated based on the theory of Cartesian currents. These papers extend the previous 
work [29] where circle-valued functions were considered. The first work which applies a TV 
approach of circle-valued data for image processing tasks is [66, 67]. An algorithm for TV 
regularized minimization problems on Riemannian manifolds was proposed in [44]. There, 
the problem is reformulated as a multilabel optimization problem which is approached using 
convex relaxation techniques. Another approach to TV minimization for manifold-valued 
data which employs cyclic and parallel proximal point algorithms and does not require 
labeling and relaxation techniques was given in [77]. In a recent approach [34] the restoration 
of manifold-valued images was done using a smoothed TV model and an iteratively reweighted 
least squares technique. A method which circumvents the direct work with manifold-valued 
data by embedding the matrix manifold in the appropriate Euclidean space and applying a 
back projection to the manifold was suggested in [59]. This can be also extended to higher 
order derivatives since the derivatives (or differences) are computed in the Euclidean space. 

Our paper is the next step in a program already consisting of a considerable body of 
work of the authors: in [77], variational models using first order differences for general 
manifold-valued data were developed. In [9], variational models using first and second order 
differences for circle-valued data were introduced. Using a suitable definition of second order 
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differences on the circle the authors incorporate higher order differences into the energy 
functionals to improve the denoising results for circle-valued data. Furthermore, convergence 
for locally nearby data is shown. Our paper [11] extends this approach to product spaces of 
arbitrarily many circles and a vector space, and [10] to inpainting problems. Product spaces 
are important for example when dealing with nonlinear color spaces such as HSV. 

This paper continues our recent work considerably by generalizing the combined first and 
second order variational models to general symmetric Riemannian manifolds. Besides cyclic 
data this includes general n-spheres, hyperbolic spaces, symmetric positive definite matrices 
as well as compact Lie groups and Grassmannians. First we provide a novel definition 
of absolute second order differences for data with values in a manifold. The definition is 
geometric and particularly appealing since it avoids using the tangent bundle for its definition. 
As a result, it is computationally accessible by the machinery of Jacobi fields which, in 
particular, in symmetric spaces yields rather explicit descriptions - even in this generality. 
Employing this definition, we introduce a variational model for denoising based on first 
and second order differences in the Riemannian manifold setup. In order to minimize the 
corresponding functional, we follow [9, 11, 77] and use a cyclic proximal point algorithm 
(PPA). In contrast to the aforementioned references, in our general setup, no closed form 
expressions are available for some of the proximal mappings involved. Therefore, we use as 
approximate strategy, a subgradient descent to compute them. For this purpose, we derive 
an efficient scheme. We show the convergence of the proposed exact and inexact variant 
of the cyclic PPA in a Hadamard space. This extends a result from [4], where the exact 
cyclic PPA in Hadamard spaces was proved to converge under more restrictive assumptions. 
Note that the basic (batch) version of the PPA in Hadamard spaces was introduced in [3]. 
Another related result is due to S. Banert [6], who developed both exact and inexact PPA 
for a regularized sum of two functions on a product of Hadamard spaces. In the context of 
Hadamard manifolds, the convergence of an inexact proximal point method for multivalued 
vector fields was studied in [74]. 

In this paper we prove the convergence of the (inexact) cyclic PPA under the general 
assumptions required by our model which differs from the cited papers. 

Our convergence statements apply in particular to the manifold of symmetric positive 
definite matrices. Finally, we demonstrate the performance of our algorithm in numerical 
experiments for denoising of images with values in spheres as well as in the space of symmetric 
positive definite matrices. 

Our main application examples, namely n-spheres and manifolds of symmetric positive 
definite matrices are, with respect to the sectional curvature, two extreme instances of 
symmetric spaces. The spheres have positive constant curvature, whereas the symmetric 
positive definite matrices are non-positively curved. Their geometry is totally different, e.g., 
in the manifolds of symmetric positive dehnite matrices the triangles are slim and there are 
no cut locus which means that geodesics are always shortest paths. In n-spheres however, 
every geodesic meets a cut point and triangles are always fat, meaning that the sum of the 
interior angles is always bigger than tt. In our setup, however, it turns out that the sign of 
the sectional curvature is not important, but the important thing is the structure provided 
by symmetric spaces. 

The outline of the paper is as follows: We start by introducing our variational restoration 
model in Section 2. In Section 3 we show how the (sub) gradients of the second order 
difference operators can be computed. Interestingly, this can be done by solving appropriate 
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(a) OnM"*. (b) On S^. 


Figure 1. Illustration of the absolute second order difference on (a) the Euclidean space M" 
and (b) the sphere S^. In both cases the second order difference is the length of the line 
connecting y and c{x,z). Nevertheless on there is a second minimizer c', which is the 
mid point of the longer arc of the great circle defined by x and 2 . 

Jacobi equations. We describe the computation for general symmetric spaces. Then we 
focus on n-spheres and the space of symmetric positive definite matrices. The (sub)gradients 
are needed within our inexact cyclic PPA which is proposed in Section 4. A convergence 
analysis of the exact and inexact cyclic PPA is given for Hadamard manifolds. In Section 5 
we validate our model and illustrate the good performance of our algorithms by numerical 
examples. The appendix provides some useful formulas for the computations. Further, 
Appendix C gives a brief introduction into the concept of parallel transport on manifolds in 
order to make our results better accessible for non-experts in differential geometry. 

2. Variational Model 

Let AI be a complete n-dimensional Riemannian manifold with Riemannian met¬ 
ric (•, ■)x'- TxM. X TxM. —)• M, induced norm H-Hx) and geodesic distance dju : Ai x Ai ^ 
M>o. Let denote the Riemannian gradient of TiAI— t-M which is characterized 

for all ^ G TxAi by 

{VmF{x),Ox = DxF[C], (1) 

where DxF denotes the differential of F at x, see Appendix C. 

Let 'yx,^{t), X G Ai, ^ G TxAi be the unique geodesic starting from 7x,^(0) = x 
with 7 x, 5 ( 0 ) = C- Further let 7 ^^ denote a unit speed geodesic connecting x,z€Ai. 
Then it fulfills 7 -^ (0) = x, 7 -- (£) = z, where C = T( 7 '- ) denotes the length of the 
geodesic. We further denote by 7 ^^ a minimizing geodesic, i.e. a geodesic having 
minimal length C{j'^ ) = dj^ix, z). If it is clear from the context, we write geodesic 
instead of minimizing geodesic, but keep the notation of using 7 -- when referring 
to all geodesics including the non-minimizing ones. We use the exponential map 
expx'. TxAi ^ Ai given by exp,,, ^ = 7 a, ,c(l) and the inverse exponential map denoted 
by logj, = exp“^: A4 —)■ TxAi. The core of our restoration model are absolute second 
order differences of points lying in a manifold. In the following we define such 
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differences in a sound way. The basic idea is based on rewriting the Euclidean norm 
of componentwise second order differences in M” as \\x — 2y + z \\2 = 2\\^{x + z) — y\\ 2 -i 
see Fig. 1(a). We define the set of midpoints between x,z€A4 as 

Cx,z := {c e M : c = for any geodesic 7 -^} 

and the absolute second difference operator d 2 : —)• M>o by 


d 2 {x,y,z):= mm dM{c,y), x,y,zeM. 

c G Cx.z 


( 2 ) 


The definition is illustrated for in Fig. 1 (b). For the manifold 

definition ( 2 ) coincides, up to the factor with those of the absolute second order 
differences in [9]. Similarly we define the second order mixed differences di^i based on 
\\w - X + y - z ||2 = 2||^(rc + y) - ^{x + z)\\2 as 

di^i{w,x,y, z) := min dM{c,c), x,y,z,wGM.. 

C^Cw,y ?CGC x,z 

Let G ;= {l,...,iV} X {1,..., M}. We want to denoise manifold-valued images 
f: Q ^ M by minimizing functionals of the form 


£{u)=£{u,f) :=F(u;/) + aTVi(u) + /3TV2(u) 


(3) 


where 


a 


^ N,M 

N-l,M N,M-^ 

TVi(u) .— CTi ^ ^ d_\4(^Uij, 0:2 ^ ^ dj\/i(^Uij, , 

i,j=l i,j=l 

N-1,M 7V,M-1 

/3TV2(u) :=/3i ^ d2{ui-ij,Uij,Ui+ij) + (32 ^ d2(ujj_i,Uij, 

i=2,j=l i=lj=2 

T /ds ^ ^ dl^l(uij, Uij-f-l, Ui-f-lj, Ui-f-lj-f-l). 
ij=l 

For minimizing the functional we want to apply a cyclic PPA [4, 12]. This algo¬ 
rithm sequentially computes the proximal mappings of the summands involved in 
the functional. While the proximal mappings of the summands in data term F{u]f) 
and in the first regularization term TVi(u) are known analytically, see [27] and [77], 
respectively, the proximal mappings of d 2 : —)• M>o and dip: A4^ —k M>o are only 

known analytically in the special case A4 = see [9]. In the following section we 
deal with the computation of the proximal mapping of d 2 . The difference dip can be 
treated in a similar way. 


3. Subgradients of Second Order Differences on A4 

Since we work in a Riemannian manifold, it is necessary to impose an assumption that 
guarantees that the involved points do not take pathological (practically irrelevant) 
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constellations to make the following derivations meaningful. In particular, we assume 
in this section that there is exactly one shortest geodesic joining x,z, i.e., x is not a 
cut point of z] cf. [23]. We note that this is no severe restriction since such points 
form a set of measure zero. Moreover, we restrict our attention to the case where the 
minimizer in (2) is taken for the corresponding geodesic midpoint which we denote by 
c(x, z). 

We want to compute the proximal mapping of d 2 : —)■ M>o by a (sub)gradient 

descent algorithm. This requires the computation of the (sub)gradient of d 2 which is 
done in the following subsections. For c(x, z) ^ y the subgradient of d 2 coincides with 
its gradient 

VM3d2 = (VA^d 2 (-, y, z),VMd 2 {x, •, z), VA^d 2 (x, y, ■))^ . (4) 

If c(x, z) = y, then d 2 is not differentiable. However, we will characterize the subgra¬ 
dients in Remark 3.4. In particular, the zero vector is a subgradient which is used in 
our subgradient descent algorithm. 


3.1. Gradients of the Components of Second Order Differences 


We start with the computation of the second component of the gradient (4). In 
general we have for y): At —)■ M>o, x i—)■ y), see [68], that 


Vxii^(x,y) 


-21oga.y, VMdM{x,y) = 


log^y 

lllogj^yllx’ 


X / y. 


(5) 


Lemma 3.1. The second component ofVj^3d2 in (4) is given for y / c{x,z) by 


Vxd 2 (x, •, 2 :)(y) 


log^ c(x, z) 
\\\ogyC{x,z)\\y' 


Proof. Applying (5) for d 2 (x, •, z) = (1 ;k(c(x, z), •) we obtain the assertion. □ 

By the symmetry of c{x,z) both gradients V^d 2 (-,y, 2 ;) and V^d 2 (x,y, •) can be 
realized in the same way so that we can restrict our attention to the first one. For 
fixed z G AI we will use the notation c(x) instead of c(x,z). Let 'y^^ = 'yx,v denote 
the unit speed geodesic joining x and z, i.e., 

7a;,i;(0) = X, 7a;,t;(0) = v = and 7a;,«(r) = z, T := dM{x,z). 

We will further need the notation of parallel transport. For readers which are not 
familiar with this concept we give a brief introduction in Appendix C. We denote a 
parallel transported orthonormal frame along by 

{Ei = Ei{t),...,En = Enit).} (6) 


For t = 0 we use the special notation 

Lemma 3.2. The first component of Vji^3d2 in (4) is given for c(x, z) y by 


Vxd2(-,y,z)(x) 


n 


E 


logc(x) y 

logc(a;) y||c(a;) 


HxC['^fc] 



(7) 
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Proof. For F: —)• M defined by F := d 2 {-, y,z) we are looking for the coefficients 

ak = ak{x) in 

n 

V mF{x) = '^akfk- ( 8 ) 

k=l 

For any tangential vector rj ;= Vkf.k £ T^M. we have 

n 

{V MF{x),ri)x = DxF[ri\ = ^ rjkak. (9) 

k=l 


Since F = f o c, with /:7W—)-M, x^dM{x,y) we obtain by the chain rule 

DxF[r]] = o Dxc) [q]. 

Now the differential of c is determined by 

n 

Dxc[n] = '^qkDxc[f,k] G Tc(x)M. 
k=l 


Then it follows by (1) and (5) that 


DxF[q] = Dc^^)f[Dxc[q]] = {VMf{c{x)),Dxc[q])c{x) 
\ II logc(x) 2/llc(x) ^ 


and consequently 


{VmF{x),v)x = '^Vk 


logc(x) y 


\ II ^®gc(x) 2/llc(x) 

By (9) and (8) we obtain the assertion (7). 


) DxC\_^k\ 


c{x) 


□ 


For the computation of Dxc[f^k] we can exploit Jacobi fields which are dehned as 
follows: for s G (—e,e), let ax^^,^{s), k = denote the unit speed geodesic 

with (0) = X and Let Ck{s) denote the tangential vector in ax,^,.{s) 

of the geodesic joining ax,^,^{s) and z. Then 


Fkis^t) := exp^^_^j^) {tCk{s )), s G (-e,e),t G [0,r], 
with small e > 0 are variations of the geodesic 'jx^v and 

d 

Jk{t) ■=-^Tk{s,t)\^^^, k = l,...,n, 

are the corresponding Jacobi field along ^x,v For an illustration see Fig. 2. Since 
Ffc(s,0) = and Ffc(s,r) = z for all s G (—£,£) we have 


-^fc(O) = Cfc, Jk{T) = 0, A: = 1,..., n. 


( 10 ) 
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= (cocr^,cJ(s) 


— 0'a;,gfc('S) 


Figure 2. Illustration of the variations Ti.{s,t) of the geodesic 'y^^v to define the Jacobi field 
Jkit) = along 72 ;,^ with respect to ^k- 


Since Tk{s,^) = {c o ax,^^.){s) we conclude by definition (37) of the differential 

Dxc[^k] = ;^(cofTx,4)(s)L=o = = Jk{^), k = l,...,n. 

Any Jacobi field J of a variation through 'jx^v fulfills a linear system of ordinary 
differential equations (ODE) [42, Theorem 10.2] 

where R denotes the Riemannian curvature tensor defined by ^ R{^X)'n ■= 

Here [•,•] is the Lie bracket, see Appendix C. Our special 
Jacobi fields have to meet the boundary conditions (10). We summarize: 

Lemma 3.3. The vectors Dxc[^k], k = 1,... ,n, in (7) are given by 

Jk ("j)’ ^ l,...,n, 

where Jk are the Jacobi fields given by 

T Ri.Jkijx,v)jx,v — 0 ) <7^(0) — ^ki JkiT) — 0 . 

Finally, we give a representation of the subgradients of d 2 in the case c(x, z) = y. 
Remark 3.4. Let (x, y, z) G with c(x, z) = y and 

V := := (6,?y,6) : G TxM,Cy = e T,M} , 

where is the Jacobi field along the unit speed geodesic 7 ^^ determined by J(0) = 

and JiT) = ^z- Then the subdifferential dd 2 at {x,y,z) G reads 

dd 2 {x, y,z) = ^ar] : y G G [ly,Urf\^ , (12) 
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where V"*- denotes the set of normalized vectors r] ;= {rix,Vy:Vz) £ TxM x TyAi x T^Ai 
fulfilling 


{f,xi Vx)x + Vy)y A {f,zi Vz)z — 0 

for all {f,x,f,y,f,z) £ V, and the interval endpoints ly, Uy are given by 

d2(exp^(rr/3,),exp (T7?j^),exp^(rr?^)) 

U = lim- - -, 

rto T 


(13) 


. d2(exp^ {rrjx), exp ), exp^(rr?^)) 

Ur, = lim- - - 

r^O r 


This can be seen as follows: For arbitrary tangent vectors f,z sitting in x, z, respec¬ 
tively, we consider the uniquely determined geodesic variation T{s,t) given by the side 
conditions r( 0 ,t) = r(s,0) = exp^ {sfx) cls well as r{s,T) = exp^, (s^z). We note 

that c{T{s,0),T{s,T)) = T{s,T/2) which implies for s £ {—e,£) that 


d2(r(s,o),r(s,r/2),r(s,r)) = o. 


(14) 


In view of the definition of a subgradient, see [26, 33], it is required, for a candidate p, that 


d 2 {eWx{hx),ex.-py{hy),e:^Y>z{hz)) - d 2 {x,y,z) > {h,fi) + o{h). (15) 

for any sufficiently small h ;= {hx,hy,hz)- Setting h ;= (^x, (T"/2), ^z), equation (14) 

tells us that the left hand side above equals 0 up to o{h), and thus, for a candidate g, 

i^x, 3x)x + 3y)y + iCz, 'nz)z = o{h) 

for all^xfiy of small magnitude. Since these are actually linear equations for w = {r]x,'>ly,Vz) 
in the tangent space, we get 


{f.x, Vx)x + ( 2 )’ 3y)y + (Cz) Vzlz — 0. 

Then, if g ^ V"*", the nominators in (13) are nonzero and the limits exist. Finally, we 
conclude (12) from (15). 

In the following subsection we recall how Jacobi fields can be computed for general 
symmetric spaces and have a look at two special examples, namely n-spheres and 
manifolds of symmetric positive dehnite matrices. 

3.2. Jacobi Equation for Symmetric Spaces 

Due to their rich structure symmetric spaces have been the object of differential 
geometric studies for a long time, and we refer to [24, 25] or the books [8, 18] for 
more information. 

A Riemannian manifold At is called locally symmetric if the geodesic reflection Sx 
at each point x G At given by mapping 7(t) i—)■ 7 (—t) for all geodesics 7 through 
X = 7(0) is a local isometry, i.e., an isometry at least locally near x. If this property 
holds globally, At it is called a (Riemannian globally) .symmetric space. More formally. 
At is a symmetric space if for any x G At and all f G TxS there is an isometry Sx 
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on A4 such that Sa;(x) = x and DxSx[^ = —C- A Riemannian manifold Ai is locally 
symmetric if and only if there exists a symmetric space which is locally isometric to 
As a consequence of the Cartan-Ambrose-Hicks theorem [18, Theorem 1.36], every 
simply connected, complete, locally symmetric space is symmetric. Symmetric spaces 
are precisely the homogeneous spaces with a symmetry Sx at some point x G Af. 
Beyond n-spheres and the manifold of symmetric positive definite matrices, hyperbolic 
spaces, Grassmannians as well as compact Lie groups are examples of symmetric spaces. 
The crucial property we need is that a Riemannian manifolds is locally symmetric 
if and only if the covariant derivative of the Riemannian curvature tensor R along 
curves is zero, i.e., 

VR = 0. (16) 

Proposition 3.5. Let M. be a symmetric space. Let 7 ; [0,T] —)■ Af he a unit speed geodesic 
and {01 = 0i(t),..., 0n = a parallel transported orthonormal frame along 7 . Let 

J{t) = ® Jacobi field of a variation through 7 . Set a ;= (oi.... ,anj^■ 

Then the following relations hold true: 

i) The Jacobi equation (11) can he written as 

a” {t) + G a{t) = (17) 

with the constant coefficient matrix G := ((R(0i, 7 ) 7 , 0 j) 7 )”j=i- 

ii) Let { 6 * 1 ,..., 6 *„} be chosen as the initial orthonormal basis which diagonalizes the 
operator 

e^R( 0 ,j)j (18) 

at t = 0 with corresponding eigenvalues Ki, i = 1,... ,n, and let {Qi,..., 0 ^} be the 
corresponding parallel transported frame along 7 . Then the matrix G becomes diagonal 
and (17) decomposes into the n ordinary linear differential equations 

af{t) + Kiafit) = 0 , i = 1 ,..., n. 


Hi) The Jacobi fields 


Jk{f) := 


sinh(^-Kfci) 0fc(i), 
< sin(^/K^t) 0fc(t), 
t 0 fc(t), 


if Kk < 0 , 
if Kk > 0 , 
if Kk = 0 , 


k = 1,... ,n form a basis of the n dimensional linear space of Jacobi fields of a variation 
through 7 fulfilling the initial condition J(0) = 0. 


Part i) of Proposition 3.5 is also stated as Property A in Rauch’s paper [56] in 
the particularly nice form “The curvature of a 2-section propagated parallel along a 
geodesic is constant.” For convenience we add the proof. 
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Proof, i) Using the frame representation of J, (38) and the linearity of R in the first 
argument, the Jacobi equation (11) becomes 

n 

0 = ^ al{f)Qi{f) + ai(t)i?( 0 i, 7 ) 7 , 
i=l 

and by taking inner products with Qj further 

0 = a"(t) + (i?( 0 i, 7 ) 7 , 0 j)ai(t), j = 1 ,... ,n. 

Now (16) implies for -R(0i, 7)7 = Ufc(t)0fc(i) 

n 

o = v^i? = j; 4 (t) 0 fc(t) 

k=l 

which is, by the linear independence of the 0^, A: = l,...,n, only possible if all Vik 
are constants and we get G = 

Parts ii) and iii) follow directly from i). For hi) we also refer to [18, p. 77]. □ 


With respect to our special Jacobi fields in Lemma 3.3 we obtain the following 
corollary. 


Corollary 3.6. The Jacobi fields Jk, k = 1,... ,n of a variation through with 

boundary conditions Jk{Q) = £,k o,nd Jk{T) = 0 fulfill 


Jk{^) 





sinlTf^TT) 

1 - 

2 7 


if Kk < 0 , 

if Kk > 0 , 
if Kk = 0 . 


(19) 


Proof. The Jacobi fields Jk{t) := ctkJkiT — t) of a variation trough 7 ^-^ '■='^x,v{T — t) 
satisfy Jfc(O) = 0 and by Proposition 3.5 iii) they are given by 


Jk{t) ■■= 


sinh(^-Kfct) Ek{T-t), 
< sinip/kfit) EkiT-t), 
fi Ek{T-t), 


In particular we have 


if Kk < 0 , 
if Kk > 0 , 
if Kk = 0 . 


Jk{T) := 


sinh(V-KfcT) Ck, 
< sm{^/KfT) fk, 

T Ck, 


Now OfeCfc = OikJk{0) = Jk{T) determines as 


if Kk < 0 , 
if Kk > 0 , 
if Kk = 0 . 


afc 


sinh(^-Kfcr), 

< sin(^/7f^T), 

T, 


if Kk < 0 , 
if Kk > 0 , 
if Kk = 0 
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and Jk (t) = ^Jk(T — t). We notice that the denominators of the appearing fractions 
are nonzero since x and z were assumed to be non-conjugate points. Finally, we 
get (19). □ 

Let us apply our findings for the n-sphere and the manifold of symmetric positive 
definite matrices. 

The Sphere We consider the n-sphere S”. Then, the Riemannian metric is 
just the Euclidean distance {■,-)x = {■,■) iii For the definitions of the geodesic 

distance, the exponential map and parallel transport see Appendix A. Let 7 := 'yx,v 
We choose •= ^ = 7(0) and complete this to an orthonormal basis 
of TrS" with corresponding parallel frame {Si,...,S„} along 7. Then diagonalizing 
the operator (18) is especially simple. Since S” has constant curvature (7 = 1, the 
Riemannian curvature tensor fulfills [42, Lemma 8.10] R(0, S)T = (S, T)© — (0, T)S. 
Consequently, 

R(0,7)7 = (7,7)0 - (0,7)7 = 0 - (0,7)7 

so that at t = 0 the vector ,^1 is an eigenvector with eigenvalue = 0 and ^i, 
i = 2,...,n, are eigenvectors with eigenvalues Ki = 1. Consequently, we obtain by 
Lemma 3.3 and Corollary 3.6 the following corollary. 

Corollary 3.7. For the sphere S” and the above ehoice of the orthonormal frame system, 
the following relations hold true: 

• T 

1 Sin ~ 

-C>xc[6] = 2 ^ 1 ( 1 -)’ = ^^Sfe(f), k = 2,...,n. 

Symmetric Positive Definite Matrices. Let Sym(r) denote the space of symmetric 
r X r matrices with (Frobenius) inner product and norm 

( 20 ) 

Let 'P(r) be the manifold of symmetric positive definite r x r matrices. It has the 
dimension dim'P(r) = n = tangent space of V{r) at x G V{r) is given by 

TxF{r) = {x} X Sym(r) = {x^gx^ : rj G Sym(r)}, in particular TjV{r) = Sym(r), where 
I denotes the r x r identity matrix. The Riemannian metric on TxV reads 

■= tT{riix~^r]2X~^) = {x~2r]ix~2 ,x~'2r]2X~'2), 

where (•, •) denotes the matrix inner product (20). For the definitions of the geodesic 
distance, exponential map, parallel transport see the Appendix B. 

Let 7 := 'yx,v and let the matrix v G TxV have the eigenvalues Ai,...,Ar with a 
corresponding orthonormal basis of eigenvectors vi,...,Vr in W, i.e., 

r 

V = y^^XiVjvj. 

i=l 


(A,R):= Pll := ( E 

hi=i \hi=] 
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We will use a more appropriate index system for the frame ( 6 ), namely 




Then the matrices 


^ij ■— 


^(Vivj + VjvJ), {i,j) el iii = j, 
:j^{vivj + VjvJ), {i,j) eX if i 7^ j, 


( 21 ) 


form an orthonormal basis of TxV{r). 

In other words, we will deal with the parallel transported frame Sjj, {i,j) £ I, 
of ( 21 ) instead of A: = l,...,n. To diagonalize the operator (18) at t = 0 we use 
that the Riemannian curvature tensor for V{r) has the form 


i?(0, H)T = — [x 20X 2 ^x 2 Hx 2 ]^x 2 Tx 2 
with the Lie bracket \A, B] = AB — BA of matrices. Then 

\ 1 i fr -- -- —-- —i" 

R(0,7)7 = — 1x2 [x 2 0X 2 ^x 27X ‘ 2 \,x 27X 2 


1 

X 2 


1 

X 2 


and for t = 0 with 0 = 0 ( 0 ) the right-hand side becomes 


ne) 


1 

4 

1 

4 




2 0X 2^x 2X!X 2 ]^ 

— 2bwb + b‘^w)x 2 ^ 


_ 1 _ 1 

X 2 vx 2 


1 

X2 


where b := x~ 2 vx ~2 and w := x~20x~2. Expanding 6 = X)(j 

orthonormal basis of TxV and substituting this into T{6) gives after a straightforward 

computation 

T{e)= ^ 

Ri) 6 X 

Thus {^ij : {i,j) G X} is an orthonormal basis of eigenvectors of T with corresponding 
eigenvalues 

Kij = ~j(Aj ~ Xj) , ihj) ^ 

Let Xi := G X : Aj = Aj} and X 2 := G X : A* 7 ^ Xj}. Then, by Lemma 3.3 

and Corollary 3.6, we get the following corollary. 


Corollary 3.8. For the manifold of symmetric positive definite matrices V{r) it holds 


XixC[f,ij] 


2 V 2 / 

sinh(f |Ai-Aj|) jT 
^ sinh(|^|Ai-Aj-|) ^*1'2 


if (i,j) G Xi, 
if (i,j) G X 2 . 
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4. Inexact Cyclic Proximal Point Algorithm 

In order to minimize the functional in (3), we follow the approach in [9] and employ 
a cyclic proximal point algorithm (cyclic PPA). 

For a proper, closed, convex function (/>: —)■ (—oo,+oo] and A>0 the proximal 

mapping prox_)^ 0 : —)■ M™' at x G is defined by 

prox;,^(x) := arg min 17 ^ ||x - y ||2 + (piv) \, 

see [49]. The above minimizer exits and is uniquely determined. Many algorithms which 
were recently used in variational image processing reduce to the iterative computation 
of values of proximal mappings. An overview of applications of proximal mappings is 
given in [51]. 

Proximal mappings were generalized for functions on Riemannian manifolds in [27], 
replacing the squared Euclidean norm by the squared geodesic distances. For (p: AI™' —)• 
(— 00 ,+ 00 ] and A>0 let 

. / 1 

:= arg mm < — 

I 2 A 

For proper, closed, convex functions (p on Hadamard manifolds the minimizer exits 
and is uniquely determined. More generally, one can define proximal mappings in 
certain metric spaces. In particular, such a definition was given independently in [38] 
and [47] for Hadamard spaces, which was later on used for the PPA [3] and cyclic 
PPA [4]. 



+ (p{y)y (22) 

j=i ^ 


4.1. Algorithm 

We split the functional in (3) into the summands 

15 

e = T,^i, 

1=1 


where Si{u) := F{u; f) and 


1 

Q^TVi(u) — ^ ^ Oil ^ ^ dj\/f(^U2i—l-l-ui,j j 
ui=o 7i=i 

1 

+ ^02 ^ dM{Ui,2j-l+U2,Ui^2j+U2) 

U2=0 i,j=i 
1 1 
= : S2+,2iiu) + Y ‘^ 4 + 1/2 

1^1=0 U2=0 


(23) 
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and 


/3TV2 (u) 

2 LtJ." 

~ ^ ^ / 3 l ^ ^ d 2 (M 3 j_ 2 +i/j j, "USi—j,'U 3 j-|_j/j^) 

ui=0 i,j=i 
2 

+ ^ 132 ^ d2( '^i,3j—2+U2 1 '^^i,3j—l+U2 1 ^i,3j+i'2 ) 

1/2=0 i,j=i 

I JV-l I I M-l I 

1 L^J ’ 

“1“ ^ ^ /^S ^ ^ di^l (U2j—l+jy3^2j —1+!2'4 ) ^^2i+!^3,2j —1+1/4) "*^21—1+!^3,2 j+i/ 4 ) ^2i+i/3,2j+i24) 

1/3,1/4=0 +i=l 

2 2 1 

^6+1/1 ('W') + ^ ^ ^9+t/i ('^■) + ^ ^ ‘?12 +i/ 3+2 j/ 4('U')- 

1/1=0 1/1=0 1/3,1/4=0 

Then the exact cyclic PPA computes starting with = / until a convergence 
criterion is reached the values 

:= o 0...0 prox^^^^ ) (24) 

where the parameters > 0 in the k-th cycle have to fulfill 

00 00 

^^Afc = oo, and ^^A|<oo. (25) 

k=0 k=0 

By construction, the functional Si, Zg 15} in (24), contains every entry of u 

at most once. Hence the involved proximal mappings of prox;^^-^ consists of can be 
evaluated by computing all involved proximal mappings, one for every summand, in 
parallel, i.e. for 

(DO) , fij) of the data fidelity term, 

(Dl) aidMiu2i-i+ui,j,U2i-ui,j), ci2dMiui,2j-i+u2,Ui,2j-u2) of the first order differences, 

(D2) / 3 i(Z 2 (^ 3 i—2+1/1 j ) )) /^ 2 d 2 (lli, 3 j—2+1/2) ^i, 3 j — l+!/2) ^*,3j+i/2 ) SeCOnd 

order differences, and /33di,i(u2i-i+i/3,2i-i+i/4> ^21+1/3,2i-i+i/4i 
tt2i-i+!/3,2j+i/4,^i2i+i/3,2j+!/4) of the second order mixed differences. 

Taking these as the functions cj) which are of interest in (22) we can reduce our 
attention to m = 1,2 and m = 3,4, respectively. Analytical expressions for the 
minimizers defining the proximal mappings, for the data fidelity terms (DO) are given 
in [27], and for the first order differences (Dl) in [77]. For the second order difference 
in (D2) such expressions are only available for the manifoldA1 = S^, see [9]. 

In order to derive an approximate solution of 

prox;,d2(5'i,5'2,53) = argmini \ '^dM{xj,gjf + Ad2(xi,X2,X3) I =: argminV’(a;) 

xGM^ I ^ j—l J xGM^ 
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(a) An inexact prox;^^^, A = 27r. (b) An inexact prox;^(j2, A = 47r. 

Figure 3. Illustration of the inexact proximal mapping {x',y',z') = prox^i^^j^ (a;, y, z). For all 
points the negative gradients are shown in dark blue. 


Algorithm 1 Subgradient Method for prox;^^^ 

Input data g = (yi,52,ffs) £ , a sequence r = {Tk}k G 

function SuBGRADiENTPROxD2(y, r) 

Initialize =5, x* = k = 1. 

repeat 

^ exp^(fc-i) 

if 'tp{x*]g) > then x* •(— x^^^ 

k k + 1 

until a convergence criterion is reached 

return x* 


we employ the (sub)gradient descent method to V'- For gradient descent methods on 
manifolds including convergence results we refer to [1, 71]. The subgradient method is 
one of the classical algorithms for nondifferentiable optimization which was extended 
for manifolds, e.g., in [26, 33]. In [26] convergence results for Hadamard manifolds 
were established. A subgradient method on manifolds is given in Algorithm 1. In 
particular, again restricting to the second order differences, we have to compute the 
gradient of ip: 


5i\ 

Vm^iP{x) = - logj,2 52 1 + AV_A43d2(xi, X 2 , X 2 ), c(xi, X 3 ) ^ X 2 . 

\log^3 53/ 

The computation of V_^3d2 was the topic of Section 3. A result of Algorithm 1 for 
the points already used in Fig. 1 (b), the Fig. 3 illustrates the proximal mapping for 
two different values of A. 

In summary this means that we perform an inexact cyclic FFA as in Algorithm 2. 
We will prove the convergence of such an algorithm in the following subsection for 
Hadamard spaces. 
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Algorithm 2 Inexact Cyclic PPA for minimizing (3) 

Input data / G a G IP>0) P £ ^>0’ ^ sequence A = {Afcjfc, Xk > 0, 

fulfilling (25), and a sequence of positive reals e = {ek}k with ^ < oo 

function CPPA(a, (3, A, /) 

Initialize = /, k = 0 

Initialize the cycle length as L = 15 (or L = 6 for the case M = 1 ). 

repeat 

for / ^ 1 to L do 

where the proximal operators are given analytically for (DO) 
and (Dl) as in [27, 77] and approximately for (D2) via Algorithm 1 
and the error is bounded by 
k ^ k + 1 

until a convergence criterion is reached 

return 


4.2. Convergence Analysis 

We now present the convergence analysis of the above algorithms in the setting of 
Hadamard spaces, which include, for instance, the manifold of symmetric positive 
definite matrices. Recall that a complete metric space {X, d) is called Hadamard if 
every two points x, y are connected by a geodesic and the following condition holds 
true 

d{x, v)‘^ + d{y, w)"^ < d{x, w)'^ + d{y, u)^ + 2d{x, y)d{v, w), (26) 

for any x, y,v,w G X. Inequality (26) implies that Hadamard spaces have nonpositive 
curvature [2, 57] and Hadamard spaces are thus a natural generalization of complete 
simply connected Riemannian manifolds of nonpositive sectional curvature. For more 
details, the reader is referred to [5, 39]. 

In this subsection, let {'H,d) be a locally compact Hadamard space. We consider 

L 

= (27) 

1=1 

where : 7^ —)■ M are convex continuous functions and assume that ip attains a (global) 

minimum. 

For Hadamard spaces T-L := Xt^, N = A • M, the functional ip = S in (23) fits 
into this setting with L = 15. Alternatively we may take the single differences in 
(D0)-(D2) as summands (pi. Our aim is to show the convergence of the (inexact) cyclic 
PPA. To this end, recall that, given a metric space {X,d), a mapping T: X ^ X is 
nonexpansive if d{Tx,Ty) <d{x,y). In the proof of Theorem 4.3, we shall need the 
following well known lemmas. Lemma 4.1 is a consequence of the strong convexity of 
a regularized convex function and expresses how much the function’s value decreases 
after applying a single PPA step. Lemma 4.2 is a refinement of the fact that a 
bounded monotone sequence has a limit. 
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Lemma 4.1 ([5, Lemma 2.2.23]). (—oo,+oo] is a convex lower semi-continuous 

function, then, for every x,y gT-L, we have 

h (pmxxhix)) - h{y) < ^d{x, yf - {Y>^o^\h{x),yf • 

Lemma 4.2. Let {ak}k£N, {bk}k£N, {cfcjfceN cind {r]k}k£N be sequences of nonnegative real 
numbers. For eaeh k G N assume 

®fc+l ^ (1 “1“ Vk) Oik bk Ck, 

along with 

OO OO 

Ck < OO and r]k < oo. 
k=l k=l 

Then the sequenee {afcjfceN converges and bk < oo. 

Let us start with the exact cyclic PPA. The following theorem generalizes [4, 
Theorem 3.4] in a way that is required for proving convergence for our setting. The 
point p in Theorem 4.3 is a reference point chosen arbitrarily. In linear spaces it is 
natural to take the origin. Condition (28) then determines how fast the functions ipi 
can change their values across the space. 

Theorem 4.3 (Cyclic PPA). Let ifH.,d) he a locally compact Hadamard space and let ip 
in (27) have a global minimizer. Assume that there exist p G LI and C > 0 such that for 
each I = 1,..., L and all x,y G LI we have 

(pi{x) - ipi{y) < Cd{x, y) (1 + d{x,p)). (28) 

Then the sequence defined by the eyelic PPA 

:= o prox;,^^^_^ o ... o prox;,^,^^ (x^^^) 

with {AfcjfceN as in (25) converges for every starting point x^°^ to a minimizer of p. 

Proof. For I = 1,... L we set 

(fc+r) . / 

1. First we prove that for any fixed q G LL and all k G Nq there exists a constant 
Cq > 0 such that 

d{x'^^^^\qf < {1 + CqXl)d{x^’‘\q)‘^ - 2Xk - ip{q)^ + CqXl. (29) 

For any fixed q G LL we obtain by (28) and the triangle inequality 

ipi{x) - (pi{y) < Cqd{x,y){l +d{x,q)), Cq := I + d{q,p). (30) 

Applying Lemma 4.1 with h := pi, x := and y ■= q we conclude 

d{x^^+h,qf <d{x^^+^-^\qf-2Xk - Ti{q)^ 
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for I = 1,... ,L. Summation yields 


1=1 

L ^ 

= d{x‘^^\qY - 2Afc((/9(x(^)) - (/?(g)) + 2\k'^{^i{x^^'>) - (/9z(x^^+i))^,(31) 


where we used (27). The growth condition in (30) gives 


(pi{x^^^^ — ipi{x^^^< Cqd{x^^\x ^^^^1 + . 

By the definition of the proximal mapping we have 

2y^k 

and by (30) further 




1 \ / 7 .. 1 ^ 


d(x(^+ \x^^^l')) 

<2XkCq{l + d{x^’^+'-T^\q)y 


for every I = 1,... ,L. For I = 1 this becomes 


d{x^’"\ L^) < 2XkCq (l + d{x^’"\ q)'j , (34) 

and for I = 2 using (34) and the triangle inequality 

d{x^'^^h,x^’^+h) < 2XkCq(l + d{x^'^^h,q)'^ 

< 2XkCq (1 + 2AfcC,) (l + d{x^’^\q)). 

By (25) we can assume that Xk < 1. Then replacing 2Cq (1 + 2Cq) by a new constant 
which we call Cq again, we get 

d(x(^+F),x(^+F)) < XkCq (l + d{x^^\q)^ . 

This argument can be applied recursively for I = 3,..., L. In the rest of the proof we 
will use Cq as a generic constant independent of A^. Using 

d(x^'^\x^^^iy < d(x^^\x^^^h^ + ••• + d(x('^+^\x(^+r)) 


we obtain 


d{x^^\ x^^~^L^) < XkCq (l + d{x^’'\ q) 
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for l = Consequently we get by (32) that 

< XkCg (^1 + d{x^^\ , 

for l = l,...,L. Plugging this inequality into (31) yields 

q)"^ < dl^x^^^q'j - 2Xk(<f{x^’‘^) - ^p{q)J + Cq (l + d{x‘'’‘\ q)‘^'j 
which hnishes the proof of (29). 

2 . Assume now that q £ T-L is a minimizer of ip and apply Lemma 4.2 with ak ■= 
d[x^^\q) , bk ■= 2Xk {ip{x^^^) — ipiq)), Ck '■= CqX\ and rjk '■= CqX\ to conclude that the 
sequence {d[x^^\q)}k£no converges and 

CXD 

< oo- (35) 

k=0 

In particular, the sequence is bounded. From (35) and (25) we immediately 

obtain miny? = liminffc_,.oo (^(x^^i), and thus there exists a cluster point z ^ T-L of 
{xi^ijfcgN which is a minimizer of <p. Now convergence of {d[x^^\ z)}k£no icnplies 

that{x('')}fcgN converges to z as A: —)■ oo. (By (33) we see moreover that {x*'^'''i^}fcgis}, 
I = 1,..., L converges to the same point.) □ 

Next we consider the inexact cyclic PPA which iteratively generates the points 
x(^+l)^ / = 1,..., L, a G No, fulfilling 

d(x(^+L\prox;,^^j(x(^+TF))) < (36) 

where {efcjfceNo is a given sequence of positive reals. 

Theorem 4.4 (Inexact Cyclic PPA). Let {'H,d) be a loeally compaet Hadamard space and 
let if be given by (27). Assume that for every starting point, the sequence generated by the 
exact cyclic PPA converges to a minimizer of ip. Let {xi^ijfcgpj he the sequence generated by 
the inexact cyclic PPA in (36), where < co. Then the sequence {xi^ij^gN converges 

to a minimizer of ip. 

We note that the assumptions for Theorem 4.4 are fulfilled if the assumptions of 
Theorem 4.3 are given. 

Proof. For k,m gNq, set 

_ j q.{k) if k < m, 

ym,k ■- I Tk-i{x^^-^^) if k>m, 

where 

Tk := prox^,,^^ o ... o prox^^^^ . 

Hence, for a fixed m G No, the sequence {ym,k}k is obtained by inexact computations 
until the m-th step and by exact computations from the step (m + I) on. In particular. 
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the sequence {yo^k}k is the exact cyclic PPA sequence and {yk,k}k the inexact cyclic 
PPA sequence. By assumption we know that, for a given m G No, the sequence 
{ym,k}k converges to minimizer ym of ip. Next we observe that the fact < oo 

implies that the set ^ k,mGNo} is bounded: Indeed, by our assumptions the 

sequence {yo,k}k onverges and therefore lies in a bounded set Dq C Ti. By (36) and 
since the proximal mapping is nonexpansive, see [5, Theorem 2.2.22], we obtain 

d(x(i),prox;^„^j(x®)) < 

+ d (^prox;^^^2(x(i)),prox;,^,^2 (prox^^ov^i(^^°^))) 

< ^+ (i(x(i),prox;,^^^(xW)) < ^ 

and using the argument recursively 

d{x^^\Toix^^^)) < £0- 

Hence {yi^k}k lies in a bounded set Di := {x & Td: d{x,Do) < eo}- The same argument 
yields that the sequence {ym,k}k with m > 1 lies in a bounded set 


Drr, ■■= 



Finally the set {ym,k- k,m G No} is contained in {x gT-L: d{x,Do) < 

Consequently, also the sequence {ym}m is bounded and has at least one cluster point 
z which is also a minimizer of ip. Since the proximal mappings are nonexpansive, we 
have d{ym,ym+i) < ^m- Using again the fact that X^^o^i <00, we obtain that the 
sequence {ym}m cannot have two different cluster points and therefore limm^.oo ym = -Z- 
Next we will show that the sequence {x^^^}k converges to z as A: —)• 00. To this 
end, choose (5 > 0 and find mi G N such that < 5 d{ymi,z) < |. Next 

find m2 > mi such that whenever m > m2 we have 


d{ymi,mi ymi) < g- 

Since the proximal mappings is nonexpansive, we get 


d{ymi,miym,m) ^ 


m—1 


00 

< V 


Finally, the triangle inequality gives 

3 3 3 

d{z, ym,m) < d{z, ymi ) T d[ymi ■, ymi,m) + d{ymi,mi ym 

and the proof is complete. (For each I = the sequence has the 

same limit as □ 
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Remark 4.5. Note that the eondition < oo is necessary. Indeed, let C := {(x,0) G 

: X G M} and let tp := d{-,C). Then one can easily see that an inexact PPA sequence with 
errors £k satisfying Sk = oo does not converge. 

Remark 4.6. While the theory of convergence for the inexact proximal point algorithm, 
especially the convergence Theorem 4-4 valid, we noticed that some functions in our 
splitting from Section 4-1 do not fulfill the assumptions of the theorem. For the convergence 
of Algorithm 2 claimed in Corollary 4-6 of the former arXiv version, all involved functions 
ipi have to be geodesically convex. Unfortunately the second order differences d2 are not 
jointly convex in their three arguments as the following discussion shows. Let M. be a finite 
dimensional Hadamard manifold. 


i) Let x{t) := ^z-rz 2 ^^'^ geodesics connecting xi,X 2 and zi,Z 2 

respectively, and c{t) := 'yx{t),z{t){^) the midpoint function. In particular we have 

c(0) = general this midpoint function does not 

coincide with the geodesic 7c := 7 ■ Take for example the Poincare disc Ai :=D 

c(0),c(l) 

and 


xi := 

zi := 


sin I tank 1 
cos ^ tank 1 

sin I tank ^ 
cos tank ^ 


X2 := 


5^2 := 


— sin ^ tank 1 
cos ^ tank 1 

— sin ^ tank ^ 
cos ^ tank ^ 


The computed the mid point curve c and the geodesic 7c are depiced Figure 4 - 


ii) The second order difference f{x,y,z) := d 2 {x,y,z) is in general not (jointly) convex. 
To this end, we use the example in i) and consider the second order differenee function 
along x{t),y{t) := 7c(t) and z{t), t G [0,1]. Since the mid point curve is not the 
geodesic, there exists a point tg £ [0,1] sueh that 


dx(c(to),7c(io)) > 0. 


Then we get 

f{x{to),y{to),z{to)) = dA^(c(tg),y(tg)) = dA^(c(tg),7c(to)) > 0 


and 

(1 - tg)/(x(0), 1/(0), 2;(0)) +to/(x(l), 2/(1), 2:(1)) = (1 - tg)dA^(c(x(0), 2:(0)), 7c(0)) 

+ tgdA^(c(x(l),2;(l)),7c(l)) = 0 


so that f is not eonvex. 

Remark 4.7 (Random PPA). Instead of considering the cyclic PPA in Theorem 4-4 j one 
can study an inexact version of the random PPA, generalizing henee [4, Theorem 3.7]. This 
would rely on the supermartingale convergence theorem and yield the almost sure convergenee 
of the inexact PPA sequence. We however choose to focus on the cyclic variant and develop 
its inexact version, because it is appropriate for our applications. 
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- x{t) 

- zit) 

- c(t) 

lc{t) 


Figure 4. The geodesics x(t), z{t) (violet) on D, its mid point curve c (cyan) and the geodesic 7c 
(light green) from the above example i) from Remark 4.6. The curves c and 7c do not coincide. 


5. Numerical Examples 

Algorithm 2 was implemented in Matlab and C++ with the Eigen library* for both 
the sphere and the manifold of symmetric positive definite matrices ^(3) employing 
the subgradient method from Algorithm 1. In the latter algorithm we choose 0 from 
the subdifferential whenever it is multi-valued. Furthermore a suitable choice for the 
sequences in Algorithms 2 and 1 is A := Aq = f, and = A^, 

respectively. The parameters in our model (3) were chosen as a := ai = a 2 and 
/3 := /3i = /32 = fy with an example depending grid search for an optimal choice. The 
experiments were conducted on a MacBook Pro running Mac OS X 10.10.3, Core i5, 
2.6 GHz with 8 GB RAM using Matlab 2015a, Eigen 3.2.4 and the clang-602.0.49 
compiler. For all experiments we set the convergence criterion to 1000 iterations 
for one-dimensional signals and to 400 iterations for images. This yields the same 
number of proximal mapping applied to each point, because we have L = 15 for the 
two-dimensional case and L = 6 in one dimension. To measure quality, we look at the 
mean error ^ 

ieg 

for two signals or images of manifold-valued data {aJijieg, {yijieg defined on an index 
set Q. 

5.1. valued Data 

Sphere-Valued Signal. As first example we take a curve on the sphere S^. For 
any o > 0 the lemniscate of Bernoulli is defined as 

•= ■ 2 ^, 1 (cos(t), cos(t) sin(t))'^, t G [0, 27r]. 
sm (t) + 1 


‘available at http://eigen.tuxfamily.org 
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(c) Reconstruction with TV2, (d) Reconstruction with TVi & TV2, 

a = 0, /3 = 10, = 3.66 x IQ-^. a = 0.16, /3 = 12.4, E = 3.27 x 


Figure 5. Denoising an obstructed lemniscate of Bernoulli on the sphere S^. Combining first 
and second order differences yields the minimal value with respect to E(fo,Ur). 
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To obtain a curve on the sphere, we take an arbitrary point p G and dehne the 
spherical lemniscate curve by 

75 (t) = logp(7(t)) 

Setting a= both extremal points of the lemniscate are antipodal, cf. the dotted 
gray line in Fig. 5(a). We sample the spherical lemniscate curve for p= (0,0,1)"’" 
at ti := Iyj, i = 0, ...,511, to obtain a signal (/o,i)^^Q G (S^)^^^. Note that the 
hrst and last point are identical. We colored them in red in Fig. 5(a), where the 
curve starts counterclockwise, i.e., to the right. This signal is affected by an additive 
Gaussian noise by setting /, := exp, rji with rj having standard deviation of a = 

independently in both components. We obtain, e.g., the blue signal / = (/j)^^Q in 
Fig. 5(a). We compare the TV regularization which was presented in [77] with our 
approach by measuring the mean error E{fo,Ur) of the result G (S^)^’^^ to the 
original data /o, which is always shown in gray. 

The TV regularized result shown in Fig. 5 (b) suffers from the well known staircasing 
effect, i.e., the signal is piecewise constant which yields groups of points having the 
same value and the signal to look sparser. The parameter was optimized with respect 
to FI by a parameter search on for a and for /3. When just using second 

order differences, i.e. setting a = 0, we obtain a better value for the quality measure, 
namely for /? = 10 we obtain FI = 3.66 x 10“^, see Fig. 5(c). Combining the first and 
second order differences yields the best result with respect to F, i.e. E = 3.27 x 10“^ 
for a = 0.16 and (3 = 12.4. 

Two-Dimensional Sphere-Valued Data Example. We define an S^-valued vector- 
field by 

G{t, s) = Rt+sSt-ses, t G [0, 57r], s G [0, 27r], 

( cos 6 — sin 0 0\ / cos 9 0— sin 0\ 

sin 6 cos 6 0 j , S'e := I 0 1 0 j . 

0 0 1/ \sin6* 0 COS0 / 

We sample both dimensions with n = 64 points, and obtain a discrete vector held 
/o G {SY which is illustrated in Fig. 6 (a) the following way: on an equispaced 

grid the point on is drawn as an arrow, where the color emphasizes the elevation 

using the colormap parula from Matlab, cf. Fig. 6(e). Similar to the sphere-valued 
signal, this vector held is affected by Gaussian noise imposed on the tangential plane 

at each point having a standard deviation of a = ^vr. The resulting noisy data / is 

shown in Fig. 6(b). 

We again perform a parameter grid search on to hnd good reconstructions 

of the noisy data, hrst for the denoising with hrst order difference terms (TV). For 
a = 3.5 X 10“^ we obtain the vector held shown in Fig. 6 (c) having E = 0.1879. 
Introducing the complete functional from (3), we obtain setting {3 = 8.6 and a = 0 an 
error of just FI = 0.1394, see Fig. 6(d). Indeed, just using a second order difference 
term yields the best result here. Still, both methods cannot reconstruct the jumps 
along the diagonal lines from the original signal, because they vanish in noise. Only 
the main diagonal jump can roughly been recognized in both cases. 
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(a) Original unit vector field. 



(b) Noisy unit vector field. 




(c) Reconstruction with TVi. 


(d) Reconstruction with TV 2 . 





TT 


0 

— TT 


(e) Colormap illustration: a signal on (left) and drawn 
using arrows and the colormap parula for elevation (right). 


Figure 6. Denoising results of an (a) S^-valued vector field, which is obstructed by (b) Gaussian 
noise on a = ^tt. A reconstruction using (c) TVi approach, a = 3.5 x 10“^, yields 

E = 0.1879 while (d) the reconstruction with TV 2 , a = 0, /3 = 8.6, yields an error of just 
E = 0.1394. 
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(a) Original image. 


(b) Noisy, cr = 0.1. 


(c) TV 1 &TV 2 , 
HSV, vectorial, 





(d) TV 1 &TV 2 , 
RGB vectorial. 


CB, 


(e) TVi, 
channel wise. 


(f) TV 1 &TV 2 , 
CB, channel wise. 


Figure 7. Denoising the “Peppers” image using approaches in various color spaces: (c) On HSV 
with an vectorial approach using a = 0.0625, /3 = 0.125 which yields a PSNR of 28.155, (d) 
on RGB with a = 0.05, /3 = 0.025 yields a PSNR of 31.241, and two approaches channel 
wise on CB, where (e) a TV approach with a = 0.05 results in a PSNR of 28.969 and (f) a 
TV 1 &TV 2 approach, a = 0.024, /3 = 0.022, yields a PSNR of 29.7692. 


Application to Image Denoising. Next we deal with denoising in different color 
spaces. Therefore we take the image “Peppers”^, cf. Fig. 7 (a). This image is distorted 
with Gaussian noise on each of the red, green and blue (RGB) channels with a = 
0.1, cf. Fig. 7 (b). Besides the RGB space, we consider the Hue-Value-Saturation 
(HSV) color space consisting of a S^-valued hue component H and two real valued 
components S,V. For the latter one, there are many methods, e.g., vector valued 
TV. For both, the authors presented a vector-valued first and second order TV-type 
approach in [10, 11]. We compare the vectorial approaches to the Ghromaticity- 
Brightness (CB), where we apply a second order TV on the real-valued brightness 
and the §^-valued chromaticity separately. To be precise, the obtained chromaticity 
values are in the positive octant of Again we search for the best value —here 
with respect to PSNR— of the denoising models at hand on a grid of for the 

available parameters. For the component based approach of CB, both components are 
treated with the same parameters. 


^Taken from the USC-SIPI Image Database, available online at http://sipi.usc.edu/database/database. 
php?volume=misc&image=15 
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(a) Original Data. (b) Noisy Data, 

Rician noise, cr = 0.03. 




(c) Reconstruction with TVi, 
a = 0.1, £; = 0.4088. 


(d) Reconstruction with TVi & TV 2 , 
a = 0.035, a = 0.02, E = 0.4065. 


Figure 8. Denoising an artihcial image of SPD-valued data. 


While for this example already the TV-based approach on the separate channels C 
and B outperforms the HSV vectorial approach, both the TV and the combined first 
and second order approach on CB are outperformed by the vectorial RGB approach. 
The reason for that is, that both channels of brightness and chromaticity in the latter 
model are not coupled. It would be interesting to couple the channels in the CB 
color model in a future work. 


5.2. 'P(3)-valued Images 

An Artificial Matrix-Valued Image. We construct an artificial image of P(3)-valued 
pixels by sampling 


G{s, t) := A(s, t) diag 


^ IT ^x+y,l 
1 -hsTtT 1 I 


s,t G [0,1], 


4 - s-t+ 1 


where A{s,t) = Rx 2 ,x 3 {'n's)Rxux 2 i\^'^s - (K(^ - s - [t - s\) - 7r|), 

1 if a > 6 


Rxi,xj{t) rotation in the Xj,Xj-plane and Safi = 


0 else. 


Despite the outer rotations the diagonal, i.e., the eigenvalues introduce three jumps 
along both center vertical and horizontal lines and along the diagonal, see Fig. 8(a), 
where this function is sampled to obtain an 25x25 matrix valued image / = G 
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■p( 3 ) 25,25 visualize any symmetric positive definite matrix fij by drawing a shifted 

ellipsoid given by the surface niveau {x G — c{i, — c{i, j)'^) = 1} for 

some grid scaling parameter c > 0 . As coloring we use the anisotropy index relative 
to the Riemannian distance [48] normalized onto [0,1), which is also known as the 
geodesic anisotropy index. Together with the hue color map from Matlab both the 
unit matrix yielding a sphere and the case where one eigenvalue dominates by far get 
colored in red. 


Application to DT-MRI. Finally we explore the capabilities of applying the denois- 
ing technique to real world data. The Camino project^ [20] provides a dataset of a 
Diffusion Tensor Magnetic Resonance Image (DT-MRI) of the human head, which is 
freely available.§ From the complete dataset of / = £ ^( 3 ) 112 x 112 x 50 

the traversal plane k = 28, see Fig. 9(a). By combining a first and second order 
model for denoising, the noisy parts are reduced, while constant parts as well as 
basic features are kept, see Fig. 9(b). To see more detail, we focus on the subset 
(i, j) G {28,..., 87} X {24,..., 73}, which is shown in Figs. 9(c) and (d), respectively. 
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A. The Sphere 


We use the parametrization 

x( 0 , (/?) = (cos (/?cos 0 , sin (/?cos 0 , sin 6 *) , 

with north pole zq := (0,0,1)"*" = x(|, (/9o) and south pole (0, 0, —I)"*" = x(—|, (/9o)- Then 
we have for the tangent spaces 


T^(S'^) = r3,(0^^)(S^) := {u G : v'^x = 0 } = span{ei(6', ^9), 62(6*, C/^)} 


with the normed orthogonal vectors ei( 0 , (/?) = || = (cos (/9sin6*, — sin (/9sin0, cos 0)"'" and 
62 ( 0 , v?) = = (—sin(/9, cos(/9,0)"*". The geodesic distance is given by d§ 2 (xi,X 2 ) = 

arccos(xJ'x 2 ), the unit speed geodesic by 7 j;^,j(t) = cos(t)x-|-sin(t)||^, and the expo¬ 
nential map by 


exp,r(t7?) = cos(t||r?|| 2 )x -h sin(t||r/|| 2 ) 


Uee http://cmic.cs.ucl.ac.uk/camino 

^follow the tutorial at http://cmic.cs.ucl.ac.uk/camino//mdex.php?n=Tutorials.DTI 
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(a) Original data. 



(b) Reconstruction with TVi & TV 2 , 
a = 0.01, /3 = 0.05. 



(c) Subset of (a). 



(d) Subset of (b). 


Figure 9. The Camino DT-MRI data of slice 28: (a) the original data, (b) the TV 1 &TV 2 - 
regularized model keeps the main features but smoothes the data. For both data a subset is 
shown in (c) and (d), respectively. 
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Finally, a unit speed geodesic trough x (with 0 > 0) and the north pole zq reads 

lx,ei{t) = COs(t)x + sin(t)ei, Jx,ei{T) = Zo, c{x) = 7x,ei(i), T = ^-9^ 

and the orthogonal frame along this geodesic as Ei{t) = ei { 6 {t),ip), E 2 {t) = 62 {9{t),ip), 
9{t) := 03; + 7 (f ~ 03;) • 

B. The Manifold V{r) of Symmetric Positive Definite 
Matrices 

We provide definitions for the manifold of positive definite symmetric r x r matrices 
'P{r) which are required in our computations, see [64], By Exp and Log we denote 
the matrix exponential and logarithm defined by Expx := Logx := 

p{I — x)<l. The geodesic distance is given by 

_1 _1 

d'p{xi,X2) := ||Log(xi ^X2X^ 2)||. 

Eurther, we have the exponential map exp^(tr]) := X 2 Exp(tx“ 2 r/x“ 2 )x 2 . The unit 
speed geodesic linking x and z for t = T = d-p{x,z) is 

= exp 3 ,(tn) := x^ Exp Log{x~hx~^)^ xE 

where u = X 2 Log(x“ 22 x“ 2 )x 2 /||Log(x“ 22 x“ 2 )||. In particular we obtain 

c(x, ^) = 2 ) ~ (x“2 2 ;x“2 ) 2 x 2 

which is known as the geometric mean of x and z. The parallel transport of 7 G T^V 
along the geodesic = ex.p^{t^) is given by Pt{r]) = x~^r]x~^ 

C. Basics on Parallel Transport 

In this section we review some concepts from differential geometry which were used in 
the paper. Eor more details we refer to [42], Let C°°{Ai) denote the set of smooth 
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real-valued functions on a manifold Ai and C°°{x) the functions defined on some open 
neighborhood of x £ Ai which are smooth at x. Further, let C°°{Ai,N') denote the 
smooth maps from Ai to a manifold N. For computational purposes we introduce 
tangent vectors by their curve realizations. A tangent vector ^ to a manifold Ai at 
X £ Ai is a mapping from to (x) to M such that there exists a curve 7: M —)■ Ad 
with 7(0) = X satisfying 

= := ^/(7(i))L=o- 

The set of all tangent vectors at x £ A4 forms the tangent space T^Ai. Further, 
TAi '■= UxTxAi is the tangent bundle of Ai. Given F £ C°°{Ai,N) and a curve 
7: {—£,£) —> Ai with 7(0) = X and 7(0) = then 

DxF-.TxAi^TF(x)M, i^DxF[i]-.= {Fo^)'{f)) ( 37 ) 

is a linear map between vector spaces, called differential or derivative of F at x. This 
is illustrated in Fig. 10 . 

Let X{Ai) denote the linear space of smooth vector fields on Ai, i.e., of smooth 
mappings from Ai to TAi. On every Riemannian manifold there is the Riemannian 
or Levi-Civita connection 

V: TAi X TAi ^ TAi 

which is uniquely determined by the following properties: 

i) Vfs+geX = fV^X + gVeX for all f,g£C^{Ai) and all E,e,X £ X{M), 

ii) V^iaX + bY) = aV~X + bV^Y for all a,b£R and E,X,Y £ X(Ai), 
hi) Vs(/A:) = E{f)X + fVsX for all / G C^{Ai) and all G X{Ai), 

iv) V is compatible with the Riemannian metric (•,•), i.e., E{X,Y) = y) + 

{X,VeY), 

v) V is symmetric (torsion-free): [H, 0 ] = Vs© — VeH, where [•,•] denotes the Lie 
bracket. 

For some real interval I, a map E: I ^ TAi is called a vector field along a curve 
7: / —;■ Ad if E{t) £ T.^p)Ad for all t £ F Let ^(7) denote the smooth vector fields 
along 7. The Riemannian connection determines for each curve 7: / —Ad a unique 
operator £ : Af (7) —)■ Af (7) with the properties: 

Tl) ^{aE + bQ) = a^E + b^Q for all a, 6 G M, 

T 2 ) £(/H) = /H + /£h for all / G C-(/), 

T 3 ) If E is extendible (to a neighborhood of the image of 7), then for any extension 
“ it holds £“(t) = V^p)“. 

Then ^E is called covariant derivative of S along 7. A vector field S along a curve 
7 is said to be parallel along 7 if 



32 



The fundamental fact about parallel vector fields is that any tangent vector at any 
point on a curve can be uniquely extended to a parallel vector field along the entire 
curve. In this sense we can extend an (orthonormal) basis of TxM parallel 

along a curve 7 and call this a parallel transported (orthonormal) frame 
along 7 . Then any vector field H G ^( 7 ) can be written as S(i) = aj(t)S(f) and 

we obtain by Tl) and T2) that 


D„ 

dt 


D 
dt ^ 


E' 


D 




i=i 


u- 


i=i 


(38) 
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